In a recent analysis we reviewed the performance of every national team in the world. However, that analysis did not consider how the style of play has evolved over time. So, in this post we visualize how each nation's performance in the FIFA World Cup has evolved at a different rate.
library(tidyverse)
library(plotly)
library(lubridate)
The first step is to read the data. I downloaded this dataset from Kaggle on 2021-07-22.
results <- read.csv("results.csv", encoding = "UTF-8")
This dataset covers \(42k+\) football matches between national teams throughout the history of international football. So, let’s take a quick look at the data:
head(results)
It is also worth looking at the context of the matches. Some of them may not be relevant at all, but the dataset also includes World Cup matches, continental tournaments, and so on:
levels(as.factor(results$tournament)) -> tournaments
sample(tournaments,20)
## [1] "South Pacific Games"
## [2] "USA Cup"
## [3] "Nations Cup"
## [4] "Vietnam Independence Cup"
## [5] "CONCACAF Nations League"
## [6] "CFU Caribbean Cup"
## [7] "FIFA Arab Cup qualification"
## [8] "United Arab Emirates Friendship Tournament"
## [9] "International Cup"
## [10] "Nordic Championship"
## [11] "World Unity Cup"
## [12] "EAFF Championship"
## [13] "WAFF Championship"
## [14] "FIFA World Cup qualification"
## [15] "Gold Cup qualification"
## [16] "AFC Challenge Cup"
## [17] "Pan American Championship"
## [18] "Oceania Nations Cup qualification"
## [19] "CONCACAF Nations League qualification"
## [20] "Dynasty Cup"
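Because `sample()` draws at random, the 20 names listed above will change on every run. A minimal sketch of how the draw could be made repeatable, assuming a fixed seed is acceptable; the seed value and the toy vector of names are arbitrary choices for illustration, not part of the original analysis:

```r
# Toy stand-in for the tournament names (hypothetical subset, for illustration)
tournaments_demo <- c("Friendly", "FIFA World Cup", "Copa América",
                      "UEFA Euro", "Gold Cup", "AFC Asian Cup")

set.seed(2021)                      # arbitrary seed, chosen for illustration
first_draw <- sample(tournaments_demo, 3)

set.seed(2021)                      # same seed, so the same draw is repeated
second_draw <- sample(tournaments_demo, 3)

identical(first_draw, second_draw)  # TRUE
```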
Filtering to tournaments with more than 100 matches played in their history:
results %>%
group_by(tournament) %>%
summarise(count=n()) %>%
filter(count > 100) %>%
select(tournament) -> popularCups
results %>%
filter(tournament %in% popularCups$tournament) %>%
ggplot(aes(x=tournament, fill=tournament)) +
geom_bar() +
coord_flip() +
labs(title="Matches in tournaments") -> p
ggplotly(p)
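To make the filtering step easier to follow, here is a small sketch on made-up data. The tournament names and the threshold of 2 are hypothetical (lowered so the toy table has something to filter), and `count()` is used as a shorthand for the `group_by()` plus `summarise(n())` pair:

```r
library(dplyr)

# Toy data: hypothetical tournaments with 3, 2 and 1 matches
matches_demo <- data.frame(
  tournament = c("A Cup", "A Cup", "A Cup", "B Cup", "B Cup", "C Cup")
)

# Keep tournaments with more than 2 matches (threshold lowered for the toy data)
popular_demo <- matches_demo %>%
  count(tournament, name = "count") %>%
  filter(count > 2)

popular_demo
```

Only "A Cup" survives, since it is the only tournament with more than 2 matches. Note that a strict `>` excludes tournaments that sit exactly on the threshold.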
Now we need to process the data a little to assign points in a standard way, based on the outcome of every match:
| Points | Outcome |
|---|---|
| \(3\) | Victory |
| \(1\) | Tie |
| \(0\) | Defeat |
In FIFA scores, 2 points can be earned by winning a shootout after a tied match; however, I ignored that for the following analysis.
Let’s see how the data looks now:
results %>%
mutate(tied=ifelse(home_score == away_score,TRUE,FALSE)) %>%
mutate(home_points=ifelse(tied == TRUE,1,ifelse(home_score > away_score,3,0))) %>%
mutate(away_points=ifelse(tied == TRUE,1,ifelse(home_score > away_score,0,3))) -> results
results %>%
filter(grepl("FIFA World Cup",tournament)) -> worldCupResults
head(worldCupResults)
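As a quick sanity check of the 3/1/0 rule, the sketch below applies it to three made-up matches. It uses `case_when()` instead of the nested `ifelse()` calls, which is only an alternative spelling of the same rule; the toy scores are hypothetical:

```r
library(dplyr)

# Three hypothetical matches: home win, tie, away win
toy <- data.frame(
  home_score = c(2, 1, 0),
  away_score = c(0, 1, 3)
)

toy <- toy %>%
  mutate(
    home_points = case_when(
      home_score > away_score ~ 3,   # victory
      home_score == away_score ~ 1,  # tie
      TRUE ~ 0                       # defeat
    ),
    away_points = case_when(
      away_score > home_score ~ 3,
      away_score == home_score ~ 1,
      TRUE ~ 0
    )
  )

toy
# home_points: 3, 1, 0; away_points: 0, 1, 3
```

One difference worth keeping in mind: with the `TRUE ~ 0` fallback, a row with a missing score would get 0 points, whereas the nested `ifelse()` version would return `NA`, so rows with missing scores would need to be filtered out first.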
After this step we also need to reshape the dataset so that every match contributes one row per team, which lets us measure the performance of each national team. Then we keep only the tournaments that contain "FIFA World Cup" in their name:
results %>%
pivot_longer(c(home_team,away_team),names_to = "homeaway", values_to = "team") %>%
mutate(points=ifelse(grepl("home",homeaway),home_points,away_points),
goals=ifelse(grepl("home",homeaway),home_score,away_score),
receivedGoals=ifelse(grepl("home",homeaway),away_score,home_score)) %>%
select(date,tournament,country,team,points,goals,receivedGoals) -> results
results %>%
filter(grepl("FIFA World Cup",tournament)) -> worldCupResults
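The reshaping is easier to see on a single match. In this sketch, one made-up match (teams and scores are hypothetical, and the helper column `is_home` is introduced only for illustration) becomes two rows, one from each team's point of view:

```r
library(dplyr)
library(tidyr)

# One hypothetical match, already scored with the 3/1/0 rule
match_demo <- data.frame(
  home_team = "Mexico", away_team = "Brazil",
  home_score = 1, away_score = 2,
  home_points = 0, away_points = 3
)

long_demo <- match_demo %>%
  pivot_longer(c(home_team, away_team),
               names_to = "homeaway", values_to = "team") %>%
  mutate(
    is_home = homeaway == "home_team",
    points = ifelse(is_home, home_points, away_points),
    goals = ifelse(is_home, home_score, away_score),
    receivedGoals = ifelse(is_home, away_score, home_score)
  ) %>%
  select(team, points, goals, receivedGoals)

long_demo
# Mexico: 0 points, 1 goal scored, 2 received
# Brazil: 3 points, 2 goals scored, 1 received
```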
The most interesting matches take place at the FIFA World Cup, so we can focus on what happens in this tournament:
worldCupResults %>%
  filter(!grepl("qualifi", tournament)) %>%
  mutate(yr = year(date)) %>%
  group_by(yr, team) %>%
  summarise(p = sum(points), goals = sum(goals),
    against = sum(receivedGoals), matches = n()) %>%
  mutate(performance = p / matches,
    ofensive = goals / matches,
    defense = against / matches) %>%
  ggplot(aes(x = yr, y = performance, fill = team)) +
  geom_bar(stat = "identity") -> p
## `summarise()` has grouped output by 'yr'. You can override using the `.groups` argument.
ggplotly(p)
Germany emerges as the best performer across all the matches related to the World Cup. That is no surprise at all: remember all the “goleadas” it has produced, in the qualifiers as well as in the knockout matches of the final stages of the tournament.
Now we can look at what happens if we focus only on the final stage, that is, filtering out the qualifiers, and on a handful of teams (Mexico, Brazil, Argentina, Germany and France):
worldCupResults %>%
  filter(!grepl("qualifi", tournament)) %>%
  mutate(yr = year(date)) %>%
  group_by(yr, team) %>%
  summarise(p = sum(points), goals = sum(goals),
    against = sum(receivedGoals), matches = n()) %>%
  mutate(performance = p / matches,
    ofensive = goals / matches,
    defense = against / matches) %>%
  filter(team %in% c("Mexico", "Brazil", "Argentina", "Germany", "France")) %>%
  ggplot(aes(x = yr, y = performance, color = team)) +
  geom_line() -> p
## `summarise()` has grouped output by 'yr'. You can override using the `.groups` argument.
ggplotly(p)
worldCupResults %>%
  filter(!grepl("qualifi", tournament)) %>%
  mutate(yr = year(date)) %>%
  group_by(yr, team) %>%
  summarise(p = sum(points), goals = sum(goals),
    against = sum(receivedGoals), matches = n()) %>%
  mutate(performance = p / matches,
    ofensive = goals / matches,
    defense = against / matches) %>%
  mutate(differenceGoal = ofensive - defense) %>%
  ggplot(aes(x = yr, color = team, y = differenceGoal)) +
  geom_line() -> p
## `summarise()` has grouped output by 'yr'. You can override using the `.groups` argument.
ggplotly(p)
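The per-match metrics behind these plots can be checked by hand on a tiny example. The sketch below uses one made-up team ("Exampleland" and its results are hypothetical) with a win, a tie and a defeat in the same year, and passes `.groups = "drop"` to `summarise()`, which also avoids the grouping message:

```r
library(dplyr)

# Hypothetical long-format rows: one team, three matches in one tournament year
team_demo <- data.frame(
  yr = c(2014, 2014, 2014),
  team = "Exampleland",
  points = c(3, 1, 0),
  goals = c(2, 1, 0),
  receivedGoals = c(0, 1, 3)
)

summary_demo <- team_demo %>%
  group_by(yr, team) %>%
  summarise(p = sum(points), goals = sum(goals),
            against = sum(receivedGoals), matches = n(),
            .groups = "drop") %>%
  mutate(performance = p / matches,          # points per match
         ofensive = goals / matches,         # goals scored per match
         defense = against / matches,        # goals received per match
         differenceGoal = ofensive - defense)

summary_demo
```

With 4 points, 3 goals scored and 4 received over 3 matches, `performance` is 4/3, `ofensive` is 1, `defense` is 4/3, and `differenceGoal` is -1/3. A positive `differenceGoal` means the team scored more goals per match than it conceded.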